


Institutional Archive of the Naval Postgraduate School 


Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items 


1983 


Evaluator bias in the Marine Corps combat
readiness evaluation system (MCCRES) its
identification and control.


Wheeler, George Montague 


Monterey, California. Naval Postgraduate School 


http://hdl.handle.net/10945/19796


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed -- and published -- scholarly author.

Dudley Knox Library / Naval Postgraduate School

411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 





http://www.nps.edu/library 






















NAVAL POSTGRADUATE SCHOOL

Monterey, California


THESIS


EVALUATOR BIAS IN THE MARINE CORPS COMBAT
READINESS EVALUATION SYSTEM (MCCRES)
ITS IDENTIFICATION AND CONTROL

by

George M. Wheeler

June 1983


Kenneth Euske 
Thesis Co-Advisor Joseph Mullane 


Approved for public release; distribution unlimited 


T208964 










REPORT DOCUMENTATION PAGE

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered): Unclassified

4. TITLE (and Subtitle): Evaluator Bias in the Marine Corps Combat
   Readiness Evaluation System (MCCRES) Its Identification and Control
5. TYPE OF REPORT & PERIOD COVERED: Master's Thesis
7. AUTHOR(s): George M. Wheeler
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School,
   Monterey, California 93940
11. CONTROLLING OFFICE NAME AND ADDRESS: Naval Postgraduate School,
    Monterey, California 93940
12. REPORT DATE: June 1983
13. NUMBER OF PAGES: 64
15. SECURITY CLASS. (of this report): Unclassified
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited
19. KEY WORDS: Combat Readiness Evaluation, MCCRES, Bias

20. ABSTRACT:
The Marine Corps Combat Readiness Evaluation System (MCCRES) was
designed to provide timely and accurate information concerning the
ability of active and reserve forces to carry out assigned combat
missions. To provide this information, units are subjected to
simulated combat problems and their performance is observed by
expert evaluators from within the Marine Corps. Though these
evaluators are considered experts in their fields, they may
inject bias into their evaluations causing an inaccurate
combat readiness rating for the unit observed.

Analysis of the MCCRES reveals three main areas where evaluator
bias may appear: senior evaluator influence, other evaluator
bias and interpretation of the mission performance standards
used to conduct the evaluation. To alleviate these problems,
three actions are explored: evaluator training, evaluator testing
and quantification of the mission performance standards.





Approved for public release; distribution unlimited.


Evaluator Bias in the
Marine Corps Combat Readiness Evaluation System (MCCRES)
Its Identification and Control


by


George M. Wheeler
United States Marine Corps
B.S., United States Naval Academy, 1976


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN INFORMATION SYSTEMS


from the


NAVAL POSTGRADUATE SCHOOL
June 1983




ABSTRACT 


The Marine Corps Combat Readiness Evaluation System
(MCCRES) was designed to provide timely and accurate
information concerning the ability of active and reserve forces
to carry out assigned combat missions. To provide this
information, units are subjected to simulated combat problems
and their performance is observed by expert evaluators
from within the Marine Corps. Though these evaluators are
considered experts in their fields, they may inject bias
into their evaluations causing an inaccurate combat
readiness rating for the unit observed.

Analysis of the MCCRES reveals three main areas where
evaluator bias may appear: senior evaluator influence,
other evaluator bias and interpretation of the mission
performance standards used to conduct the evaluation. To
alleviate these problems, three actions are explored:
evaluator training, evaluator testing and quantification of the
mission performance standards.


TABLE OF CONTENTS


I.   INTRODUCTION
     A. PURPOSE
     B. BACKGROUND
     C. SCOPE AND METHODOLOGY

II.  EVALUATION
     A. DEFINITION AND PURPOSE OF EVALUATION
        1. Definition of Evaluation
        2. Purpose of Evaluation
     B. PRINCIPLES OF EVALUATION
     C. APPROACHES TO EVALUATION
        1. The Systems Analysis Approach
        2. The Behavioral-Objectives (Or Goal-Based) Approach
        3. The Decision-Making Approach
        4. The Goal-Free Approach
        5. The Art Criticism Approach
        6. The Professional Review (Accreditation) Approach
        7. The Quasi-Legal (Adversary) Approach
        8. The Case Study (or Transaction) Approach
        9. Summary
     D. WHEN TO EVALUATE
        1. Context Evaluation
        2. Input Evaluation
        3. Process Evaluation
        4. Product Evaluation
     E. SUMMARY



Iti. CePA UOICC MC Tes eM eis s) 6 <« « «© «6 © © «© «© «© « « « 26 
Le) Creel Gt sl e lle) 6 6 © « «© « « © «© « «© « « 20 
PCC UMMM enCEEe = te™ Us is i+ «© © « «© © © «© © © « « « 2 
Ce CCG CMGI Eco cls )lisls «© © «© 6 -c« « s « » 29 

Weeicmeeee ec EEE OE s . « s « © 2 6 © « « « « « 39 
Peet GG OMS (eee + «96 « * «© © + « « « 33 
Peru CON SOURCES ss « =« 6 6 e« « « « »© «© « « 34 
(Ree SUIS hate eEvaa a tORS 6 6 « « «© « 6 « 2 6 » 34 
PCO GM NCIMac OTS 5 6. «2 « «© « © « «© «© « « 39 
Deep Dictmeemecced Par wy EValLUatoOrs . « « « « « 39 
Dee US ORMOND cous cu. « 6 © ©. e:-6 © © « «© « «6 « 36 
on CO) Maes OMEN@ iO acto 6 6. « 6 «© ¢€ 6 © «6 « « « 3/ 


Pee OCIS or aee TON 2 = 6 «<< «6. © <2 <« « « « 3f 


Pee Viceloeeon EN OXGerRenGe @ a 5.6 = @ « «6 =» «= 38 
CPP Ien Me ee Pe, a, «6 el pe) Sos ce ee eC SS 


4. Evaluator Knowledge of Evaluation Purpose 38 
Ge PO mee ON SGM NOUS «6 + « + « « « « « 39 
ieee eviceintiece ind imadg fs 1s 6 « « « « s+ « « » 39 
Zee DeaMemonOnabennelysis <9. «14 « « 6 « « « « 40 


3. Testing Evaluators . . . . 41


i ReGucang SUD JECeivity Of Evaluation 
Mecapnlins = Suomen cules ste 9s “6 6 «© © «) 5 © o 6 4 1 
Fs SUMMARY @ es s @e es e@ e @ e e @ @ @ @ @ ® @ @ @ iw? 


ys PICO Nec sles Ue uel «© «© «© «© © ¢ @ « © « « « « « 44 
POSE One wes Colles « «© «+ 3 ss 0 «© »«@ « « « « « « 44 

Pen TP RGUGURU VE as) sl lef.e fete secs « «© © 6 «© «© « « « 45 

Ieee tee ubecimexerctcse Controller (TEC) . . « . 46 

PPA roolMiaiOMomeMiacon s G6 0 « « « 5 « s « « 46 
POG CN inne O DiGi 6 « « « « « 6s « «© « «© «© ce « 49 
(eC OreivalindceGr UhELUCRCe « < . « « »« « « 49 
POPUL Or BLaSeS 9. + « « « « « « « « 90 

Heetess Om Performance Standards ..... - 53 

Deon cia ERORLE MS PERCEIVED BY FIELD USERS . 54 





tote ewer SOLUTIONS . » « « « « 
1. Evaluator Training
2. Evaluator Testing
3. Quantification of MPS's

F. CONCLUSIONS

G. RECOMMENDATIONS FOR FUTURE RESEARCH


LIST OF REFERENCES


INITIAL DISTRIBUTION LIST





LIST OF TABLES


1A PUGMEMa Cc OGM Gis . SNSNMGINSEs 6 « 6 « © © «© © «@ @ 


II. MPS Requirements Susceptible to Evaluator Bias .. 55





LIST OF FIGURES 


Deficiency and Contamination . 
Evaluator Disagreements .. .
MCCRES Evaluation Structure . 





I. INTRODUCTION


A. PURPOSE


The purpose of this paper is to examine the Marine Corps
Combat Readiness Evaluation System (MCCRES) to discover if
the system is susceptible to biases which may cause the
results of evaluations to inaccurately reflect the combat
readiness of evaluated units. To guide research, two
specific questions are posed:

1. Can factors of the MCCRES evaluation which are
subject to evaluator bias be identified?
2. How can these factors be controlled or
controlled for?


B. BACKGROUND


The Marine Corps Combat Readiness Evaluation System was
designed to provide timely and accurate information
concerning the ability of operating units of the Marine
Corps, both active and reserve, to carry out assigned combat
missions. The system uses "expert" evaluators from various
specialty areas to observe and grade simulated combat
operations. Aggregating these evaluations provides an overall
view of a unit's readiness for combat, and feedback from the
evaluation allows the unit commander to identify and correct
potentially problematic areas within his command.

Though the MCCRES is relied upon as a standard against
which units are judged, the readiness grade received could
be more dependent upon the evaluator than the actual task
performance being graded. By controlling or controlling for
evaluator bias, a more uniform standard by which to judge
combat readiness can be realized.






C. SCOPE AND METHODOLOGY


This thesis views the MCCRES as an information system
and explores areas where evaluator bias (input) can cause
ratings (output) to reflect the evaluator's opinion rather
than the mission performance of the evaluated unit. Two
major topics are researched:

1. Evaluation--Its major approaches and principles
2. Evaluators--Their sources and typical errors

These areas are related to the MCCRES and methods of
controlling or controlling for evaluator bias are developed.

The research consists of a detailed literature search in
the area of evaluation science. Methods for the reduction or
control of evaluator bias are explored for use in the
context of the MCCRES.





II. EVALUATION

This chapter addresses the evaluation process,
presenting definitions, purposes and principles of
evaluation, and explores some currently used approaches for
conducting evaluations. The questions of what to evaluate
and when to evaluate are also investigated.

The terms goal and objective are used throughout this
and succeeding chapters. Objectives refer to long range
statements of purpose within the organization. They generally
cannot be specifically stated and need not be attainable
in the immediate future. Alternatively, goals are more
readily attainable in the short run and are specifically
stated. They can appear as written statements which guide the
organization's operations, and are a standard against which
performance can be measured.


A. DEFINITION AND PURPOSE OF EVALUATION 


1. Definition of Evaluation

There are many definitions of the term evaluation.
Rather than select a single author's definition, two
observations and two definitions of evaluation are presented here
to show both the similarities and differences encountered in
the field of evaluation research. These definitions and
observations are given in order from simple to rigorous.
The first, more an observation than a definition, is
from E.R. House:

    At its simplest, evaluation leads to a settled opinion
    that something is the case. It does not necessarily lead
    to a decision to act in a certain way, though today it
    is often intended for that purpose. Evaluation leads
    to a judgement about the worth of something.
    [Ref. 1:p.1]






The second observation about evaluation, in particular
the evaluation of a process, is that its scope "is
confined to assessing what a particular program has
accomplished in meeting its immediate objectives...," and
assessing the "workability" of a program [Ref. 2:p.11].

Henry W. Riecken's definition looks upon evaluation
as "the measurement of desirable and undesirable
consequences of an action that has been taken in order to forward
some goal that we value." [Ref. 3:p.54]

Finally, the definition presented by Stufflebeam et
al. is that "...evaluation is the process of delineating,
obtaining, and providing useful information for judging
decision alternatives." [Ref. 4:p.40]

There are two factors common to each of the
preceding observations and definitions. First, evaluation
is concerned with making a judgement or assessment of
something. Second, that judgement can be made in terms of
some goal or objective. These two factors form a
basis for a definition of evaluation developed in the next
section.


2. Purpose of Evaluation

Using the above descriptions of evaluation, the
purpose of evaluation can be examined. Stufflebeam et al.
stated simply that "The purpose of evaluation is not to
prove but to improve." [Ref. 4] Combining this statement
with the ideas set forth in defining evaluation, we may look
at evaluation as a judgement of something, say a program,
with the purpose of improving the current attainment of that
program's goals or objectives. This position, though, seems
to make evaluation a method of program improvement rather
than a tool to help achieve this end. The judgement made
may indicate some action which should be taken to improve
the organization's goal attainment, but the judgement in and
of itself does not cause the organization's goal attainment
to improve. As such, the evaluation is a tool for program
improvement. Evaluation as a tool for decision making is
brought out by Anderson and Ball. Their use of the phrase
"...contribute to decisions..." [Ref. 5] in describing
evaluation makes clearer the idea that evaluation is a tool
rather than an end in itself.


If the above purposes of evaluation are accepted,
then we may wish to form a new definition of evaluation.
This definition takes into account evaluation's purpose.
Aggregating the previously cited authors' opinions and
definitions we may look at evaluation as a judgement of some
program with the purpose of contributing to decisions
concerning the attainment of that program's goals or
objectives.



B. PRINCIPLES OF EVALUATION 


There appears to be a general acknowledgement among
authors of evaluation literature that a group of principles
exists which governs the conduct of evaluations. Tracey
[Ref. 6] listed six principles which may be found in various
forms in the writings of other authors [Ref. 1, 4, 5, 8, 9].

Evaluation must:

1. Be conducted in terms of purposes, that is, the
objectives must be known. If the objectives are not
known, the evaluation effort cannot measure how well
they are being attained.
2. Be cooperative. Cooperation of all organizational
levels is essential. Without free communication,
evaluation results will not reach all parties,
diluting their usefulness.
3. Be continuous. Evaluation must be an on-going
process to accurately track performance and aid
planning in light of current objective attainment.






4. Be specific. Generalizations are not as useful
as specific information about performance.
5. Provide information of sufficient quantity and quality
to evaluate not only the program output but also the
mechanism of converting inputs to outputs and
individuals' performance within the mechanism.
6. Be based on uniform and objective methods and
standards. Methods and standards which change from
one evaluation to the next destroy trust and leave
those being evaluated questioning how they should
perform their work tasks. [Ref. 6:p. 14-15]


C. APPROACHES TO EVALUATION


How does one approach or categorize evaluation? The
following section discusses eight approaches to or categories
of evaluation forwarded by House [Ref. 1].


1. The Systems Analysis Approach


The systems analysis approach defines a small number
of output measures and attempts to relate differences in
programs to variations observed in the variables. The data
acquired through this observation is quantitative in nature.
Correlational analysis or other statistical methods are used
to relate the output measures to the programs being
evaluated. This method is widely used in the Department of
Health, Education and Welfare in evaluating federal social
welfare programs.
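
The correlational step described above can be illustrated with a minimal sketch. It is not drawn from any evaluation cited in this thesis: the function name pearson_r, the variable names and the site data are all hypothetical, chosen only to show how one output measure might be related to one program variable.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, one value per program site (assumed for illustration).
staff_hours = [10.0, 20.0, 30.0, 40.0, 50.0]      # program variable
patients_served = [12.0, 25.0, 31.0, 46.0, 52.0]  # output measure

r = pearson_r(staff_hours, patients_served)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 suggests the output measure moves with the program variable, while a value near zero suggests little linear relationship. As the section goes on to note, such a result is only as useful as the choice of output measure.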


An example is the Office of Economic Opportunity
(OEO) evaluation of the Neighborhood Health Center (NHC)
program. The OEO defined five areas of interest to be
investigated in determining the impact of the NHC's. These
areas of interest were:
leueeotGeesseer the WHC's in previding comprehen- 
Sive hea 


* 
t 


€ 
iaieeceat e tO ths) poor . 
ioe rea Ctieohmeomtne Cates ~eceived at “he 


3. Degree of implementation of comprehensive
and continuous family care at the NHC's.
4. Functional and organizational comparison of
the NHC's.
5. Antipoverty consequences of NHC services.
[Ref. 7:p.107-121]

The NHC program was evaluated according to the attainment
of objectives which relate to the five specified interest
areas.

One problem which may be seen with this approach is
ensuring the output measures selected truly reflect the
organization's goals. If the selected measures do not
accurately reflect those goals, the outcome of this approach may
be of limited use.


2. The Behavioral-Objectives (Or Goal-Based) Approach

This approach, popularized in business and government
organizations as management by objectives, uses the
stated goals of a program as the output measure and
evaluates program success by the attainment of these goals. It
can be seen that this method of evaluation addresses only
the issue of program effectiveness, providing no information
on program efficiency. In this sense, effectiveness is a
measure of the extent to which an organization's objectives
are achieved. Efficiency refers to the cost of converting
program inputs to outputs, that is, the cost of objective
achievement. An early advocate of this behavioral-objective
approach was Tyler [Ref. 8] who advanced this method for
evaluating educational goals in terms of student behaviors.
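
The distinction between effectiveness and efficiency can be made concrete with a short sketch. The class ProgramResult, the two functions and all figures below are hypothetical and assumed only for illustration; they show one simple way the two measures, as defined above, might be computed side by side.

```python
from dataclasses import dataclass


@dataclass
class ProgramResult:
    name: str
    objectives_stated: int    # objectives the program set out to meet
    objectives_achieved: int  # objectives actually met
    cost: float               # resources consumed, in any consistent unit


def effectiveness(result):
    """Extent to which stated objectives were achieved (0.0 to 1.0)."""
    if result.objectives_stated <= 0:
        raise ValueError("a program must state at least one objective")
    return result.objectives_achieved / result.objectives_stated


def cost_per_objective(result):
    """Efficiency, expressed as the cost of each objective achieved."""
    if result.objectives_achieved <= 0:
        raise ValueError("no objectives achieved; cost per objective undefined")
    return result.cost / result.objectives_achieved


# Two hypothetical programs that a goal-based evaluation would rate equally.
program_a = ProgramResult("A", objectives_stated=10, objectives_achieved=8, cost=40000.0)
program_b = ProgramResult("B", objectives_stated=10, objectives_achieved=8, cost=100000.0)

for program in (program_a, program_b):
    print(program.name, effectiveness(program), cost_per_objective(program))
```

Both programs are equally effective, yet program B spends two and a half times as much per objective. An evaluation that records only goal attainment would not reveal that difference.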


Peter F. Drucker popularized the term "management by
objectives" in his book The Practice of Management.
Implementation of management by objectives (MBO) forces
individuals and organizations to define their
responsibility in terms of measurable expected results
called objectives. Performance is determined by comparing
objective attainment against the objectives stated. The
popularity of the approach can be seen in its widespread
use. A 1976 study showed 41 percent of the hospitals
surveyed used MBO and another 33 percent were planning to
start in the near future [Ref. 10:p.8-11]. MBO is used not
only as an evaluation approach, but as a means of planning,
coordination, communication and control. An advantage is the
explicit statement of objectives which lets workers know
their specific duties and encourages communication between
workers and supervisors relating to job performance. A
major disadvantage is the problem of specifying behaviors
rather than performance. Specific objectives are
measurable, but behaviors are not necessarily measurable
in the context of contributing to goal attainment.
Beene 12p.487] argued chat *...acting with purpose..." 

is not equivalent to "...taking means to a well defined end."
In other words, though a specified behavior may be observed,
it does not follow that this behavior leads to a desired
objective.


3. The Decision-Making Approach

As an earlier definition of evaluation implied,
evaluation is closely related to decision-making. The
decision-making approach holds that an evaluation is
structured according to the decisions which must be made. It
assumes that the decision-maker's concerns are the
significant areas the evaluation must address. By structuring the
evaluation in this manner, the results should be of greater
use to the decision-maker. This approach relies heavily on
survey methods such as interviews and questionnaires.

Stufflebeam et al. [Ref. 4], whose previously cited
definition of evaluation includes the idea that evaluation
is providing information for judging decision alternatives,
are advocates of this approach. The evaluation is
structured with respect to the decision-makers' concerns and
position in the organization, and specific evaluation
subtasks are identified and assigned. The results of these
subtasks are aggregated and communicated to the
decision-maker in order to aid in the decision process.
[Ref. 4] This approach relieves the evaluator from
having to guess the audience of the evaluation, thereby
providing structure for the entire evaluation. On the
other hand, this approach assumes that the decision-maker's
goals are the same as those of the entire organization,
which may or may not be the case.


4. The Goal-Free Approach

Each of the previously discussed approaches involved
program evaluation in terms of program goals and specific
goals for the evaluation. The goal-free approach seeks to
conduct evaluation in terms of program goals without
reference to the goals for the evaluation; indeed, the evaluator
is purposely kept unaware of these goals so as not to be
biased by them.

Scriven [Ref. 11], a leading proponent of this
school of thought, feels that the goal-free approach is a
valid method of reducing bias in evaluation, since knowledge
of evaluation goals can influence the evaluator. For
example, an evaluator who is tasked with conducting a
performance evaluation of an employee with the explicit
intent of determining whether the employee should be
terminated may deliver a different evaluation if the intent is
not stated. In the former instance, evaluator knowledge that
his evaluation may result in a worker losing his job may
bias the outcome of the evaluation. By being unaware of the
evaluation intent, the latter situation may result in a more
accurate representation of the worker's performance.

This approach is widely used in the area of consumer
product evaluations. Various consumer organizations
regularly evaluate products placed in the market without
knowledge of the manufacturers' goals. These evaluations stress
standards and criteria which they (the consumer
organization) feel are beneficial to the consumer. One main problem
encountered in this approach is the choice of evaluators.
Scriven [Ref. 11] sees evaluators as experts, able to
eliminate and prevent both self-bias and bias of others from
impacting on the evaluation. A variety of techniques, such
as codes of ethics or double-blind experiments, are
available to assist the evaluator in eliminating bias.


5. The Art Criticism Approach

This approach relies upon the critic to make judgement
on a program much the same way an art critic would
judge a fine painting. Though opinions on specific details
may vary, there is generally a consensus among critics of a
certain endeavor as to what constitutes a notable work. This
implies an extensive base of common knowledge among those
eligible to conduct such criticism.

Eisner makes a distinction between connoisseurship
and criticism. While connoisseurship is "recognizing and
appreciating the qualities of the particular" it requires no
public disclosure or judgement. Criticism necessarily
encompasses connoisseurship. "Criticism is the art of disclosing
the qualities of events or objects that connoisseurship
perceives." [Ref. 12:p.197]

The key purpose of criticism is to increase awareness
of a subject area and convey judgements in terms of
criteria which are accepted among those knowledgeable in
the area. It allows the uninitiated to gain an appreciation
of that area through the critic's knowledge. Though
generally associated with art, literature and other basically
subjective areas, the art criticism approach to evaluation has
been applied to the field of education with some success.

A key problem with this approach is generating
acceptance of the critic's criteria for judging a program. A
critic may possess extensive knowledge in his field, but if
the audience of his evaluation is not receptive, his
criticism is not likely to carry much weight.


6. The Professional Review (Accreditation) Approach

The professional review approach has some distinct
parallels with the art criticism approach immediately above.
Professional review relies upon expert opinion concerning
generally accepted standards of performance in evaluating a
particular area. The standards here, though, are usually
more easily quantified, allowing a more structured
approach in the evaluation. Professional review also is apt
to use many members, organized as an accreditation or review
board to conduct the evaluation. Standards and measurement
criteria are determined by the professionals themselves as
they are accepted as the experts in their fields. This
approach produces an evaluation of professionals by
professionals and its outcomes are not easily influenced by the
layman.


7. The Quasi-Legal (Adversary) Approach

One of the long standing approaches for evaluating
and policy-making is the quasi-legal approach. It is an
approach to evaluation which closely imitates legal
procedures. Information, or 'evidence', concerning a program
is obtained from 'witnesses', much as testimony is received
in a court of law. Information both for and against a
particular program is presented, and great care is exercised
to ensure that all pertinent information is received, after
which a panel of evaluators weighs the evidence heard and
can reach a decision as to the worth of the program.
Examples of this approach abound in today's government,
ranging from local school board decisions on grade school
curricula through panels like the
Warren Commission which investigated the assassination of
President Kennedy.


This approach does not rely only on expert evaluators
as have several previous approaches. Additionally it
not only accepts but encourages personal bias and opinion in
producing information. As Wolf notes:

    The ultimate evidence which guides deliberation and
    judgment includes not only 'facts', but a wide
    variety of perceptions, opinions, biases, and
    speculations, all within a context of values and beliefs.
    [Ref. 13:p.21]


The ultimate goal of this approach is to reach a definite
decision on some issue. Its conclusions answer yes-or-no
questions, such as 'Is the program meeting its goals?', rather
than matters of degree, as 'To what extent are the program's
goals being met?'


8. The Case Study (or Transaction) Approach

This approach is widely used and accepted in
organizational studies. It focuses on program processes and
interactions, both within and outside the program, with the
intent of giving the reader of the case study a greater
appreciation of the program's workings. This approach
commonly presents interviews with people in the program and
observations made by the interviewer at the program site in
the form of a case. The case can be examined by evaluators


and conclusions reached through discussions and sharing of
ideas among the evaluators. The case study and its
conclusions are aimed at the reader who does not possess a great
knowledge of the evaluation area as a means of increasing
his/her understanding by illustrating how others view the
program being evaluated. This approach allows the reader to
more fully understand the internal workings of the program
and how program inputs are converted to outputs.

A major problem with this approach can be ensuring
confidentiality for the members upon which the case study
was based. Case study authors may have difficulty
disguising all of the personalities involved in the case.

Another problem which may be encountered is
representing the great diversity of a large case study.
A complicated case with
personal interactions can require a tremendous editorial
effort to ensure that it is accurate and understandable.


9. Summary

The above approaches are certainly not all-inclusive,
nor can all approaches to evaluation be
fit into these eight categories. They are intended to show
the variety of approaches available for conducting
evaluations. Though the overall purpose of evaluation may be
the same, that is providing information to aid in decision
making, different situations may call for different
approaches to provide necessary information. The eight
approaches show that techniques can be chosen to fit
evaluation to evaluator skill (quasi-legal vs. professional
review approaches), program objectives (system analysis vs.
behavioral-objectives approaches), or even to set aside
evaluation objectives (goal-free approach).





D. WHEN TO EVALUATE


Stufflebeam et al. [Ref. 4] provide a view of
evaluation which investigates when in the program life
cycle evaluation is to take place. They have defined four
types of evaluation--context, input, process, and product
evaluation--which serve functions from program inception
through final impact on the system in which the program
operates. Each evaluation type is explained briefly below.


Context evaluation is used in the planning process [...] the
goals [...] being used. This [...] which are used as
yardsticks against which program performance is measured.
[...] two modes of [...]: contingency and congruence. The
contingency mode looks outside the system for factors which
may [...]. Analytically, if-then type questions [...]: if
[...] costs continue to rise, then is our present budget
adequate? Congruence mode is a comparison [...] actual
performance. This mode [...] goal attainment. As op[...],
this mode looks only wit[...] provide evaluation data.


Input evaluation is concerned with the use of available
resources [...] the objectives formulated in context
evaluation. It is useful in a [...] and its output can be
compared to a cost/benefit analysis with resource use as the
cost and goal attainment as the benefit. Besides [...], input
evaluation also h[...] problems as the need for additional
resources [...] strategic decisions.


3. Process Evaluation


Process evaluation begins after program approval and
implementation. Process evaluation analyzes the program
process as it is operating to provide information on whether
the process is working as designed. Stufflebeam et al.
[Ref. 4] point out that this type of evaluation is
particularly [...] early in program implementation, when firm
[...] information is not yet available. [...] to measure how
well it is carrying out the program plan.

Product evaluation provides information on goal attainment,
how well the stated objectives are met. It is a major input
to decisions which would modify the program after
implementation.

The view provided by Stufflebeam et al. [Ref. 4] should not
be regarded as an evaluation approach different from those
listed by House [Ref. 1], but as an expansion of those
approaches. Each of the eight approaches could be structured
to look specifically at input, context, process or output
though, as implied earlier, the different approaches may not
be equally effective in providing information in these four
areas. The Stufflebeam et al. view can be seen as helping
determine the timing of evaluations, using one of House's
approaches, to provide information on specific portions of a
program's life-cycle.







E. SUMMARY


This chapter has focused on the main ideas and approaches
available in evaluation science. Definitions of evaluation
and its purposes were presented to show the similarities and
differences that exist among authors of evaluation literature
and a definition of evaluation was formed. The definition
looked upon evaluation as a judgement of a program with the
purpose of contributing to decisions concerning the current
attainment of that program's goals or objectives. Guidelines
for evaluation were also presented, demonstrating how and
when evaluation should be conducted and what kind of
information should be provided by the evaluation.

[...] eight approaches which are available [...]. These
approaches provide different evaluation structures depending
on the type of information desired from the evaluation or the
different evaluation assets available. Finally, a view of
evaluation which addresses when to perform evaluation was
added to the eight evaluation approaches.

With this grounding in the fundamental ideas of evaluation,
the next chapter will focus on the evaluator's roles and
responsibilities, and some problems associated with [...].
The evaluator's implementation of the above principles and
methods can greatly influence the eventual outcome of the
evaluation.







III. EVALUATORS 


The ideal rater who observes and evaluates what is
important and reports his judgement without bias or
appreciable error does not exist, or if he does, we
don't know how to separate him from his less effective
colleagues [Ref. 14:p.7]

Though the above statement may be true, many steps have been
taken in evaluation science to identify competent evaluators
and improve performance of evaluators in general. This
chapter looks at the evaluator, beginning with a discussion
of objectivity and validity as they relate to evaluation. Who
performs evaluations and whether they come from inside or
outside the organization is investigated, with advantages and
disadvantages of each source. A discussion of the errors
evaluators typically make is presented along with sources
which may cause these errors. The chapter closes with a
discussion of several methods for reducing the errors
evaluators bring into their evaluations, ranging from
training the evaluator to improving the tools the evaluator
uses in performing evaluation.


A. OBJECTIVITY


Objectivity, in the context of evaluation, is the ability to
observe something as it physically exists without the
inclusion of personal feelings about the object. For example,
the statement 'Joe is six feet tall' would be considered more
objective than saying 'Joe is a giant'. The former could be
adequately demonstrated using a tape measure, while the
latter is largely dependent upon the particular observer's
concept of what is giant and what is not. As House points
out:


Objectivity is often equated with agreement among
observers. Agreement is accomplished by having ext[...]
procedures for observation. By this [...] objectivity is
achieved by having observers [...] what they
see--replication of obs[...]
Muctes) 130.215 5 




House calls this the quantitative notion of objectivity. The
concept of reliability in observation closely parallels this
quantitative notion. Reliability is based on the ability to
replicate observations. That is, if a particular observation
of an object can be replicated, that observation is assumed
to be reliable.


B. VALIDITY


[...] validity is important to evaluation. [...] the
qualities of [...] evaluation [...] the qualitative sense of
objectivity. [...] taken in the extreme, the quantitative
notion of objectivity [...] the method of verif[...]. An
observation may be widely agreed upon [...] closely does it
represent real[...]? How 'good' is the observation? To
illustrate, Scriven [...] television receiver evaluator
[...]. The evaluator used a mechanical device to measure
decibel gain of the receivers, [...] there was little
correlation between decibel gain and picture quality. The
observations obtained were able to be replicated and the
results widely agreed upon but they did not really relate to
picture quality. In this case, the evaluation was
quantitatively reliable but lacked quality. [Ref. 15] The
issue of evaluation quality is commonly referred to as
validity.





As a method of relating observations to objects we wish to
evaluate, Cummings and Schwab [Ref. 16] suggest the concept
of construct validity. A construct is a mental image we have
of something, the way we perceive something. Validity, in
this context, refers to the correlation between our mental
image and some measure of it. In the previous example, there
was little correlation between decibel gain of the television
receivers and quality of the picture, hence there was little
construct validity. A different measure which more closely
corresponds to our mental image of picture quality could be
chosen. The clo[...] chosen corresponds with our mental image
of something, the [...] construct validity. A different
measure such as viewer satisfaction will have varying degrees
of construct validity according to how closely it compares
with our mental image of picture quality.


ct 


To better illustrate the concept of construct validity,
consider Figure 3.1. As shown, the left circle represents
some construct we are interested in and the right circle
represents some measure of that construct. Ideally, there
would be complete overlap of the circles representing a total
correlation between the construct and the measure used. There
are two general reasons that the two circles do not
completely overlap--measurement deficiency and measurement
contamination [Ref. 16].


~ 


Measurement deficiency occurs when the measure fails to take
into account all of the factors present in our construct. For
example, a measure of a data processing department's
performance which accounted for quantity of output but
neglected quality and timeliness would probably be considered
deficient.

Measurement contamination, in contrast to measurement
deficiency, occurs when the measure takes into account
factors which fall outside our construct. If our measure of
the data processing department's performance includes items
such as corporate sales or top [...] the department, the
mea[...].

Figure 3.1   Deficiency and Contamination (two overlapping
circles, construct and measure; adapted from the cited
source, p.75).

It may be seen that both deficiency and contamination in
measurement [...] construct validity. If our measures do not
[...] factors [...] to our [...] factors outside our
construct, it is unlikely that the measures will accurately
reflect the mental image of the construct. Both of these
circumstances, then, decrease construct validity.
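
The two-circle picture can also be read as a comparison of two sets of factors. The following is a minimal sketch of that reading, not something taken from Cummings and Schwab; the factor names are assumptions drawn loosely from the data processing example in the text.

```python
# Illustrative sketch only: factor names are hypothetical labels for the
# data processing department example, not terms from the cited source.
construct = {"quantity", "quality", "timeliness"}   # what we mean by performance
measure = {"quantity", "corporate_sales"}           # what the instrument records

# Deficiency: factors in the construct that the measure fails to capture.
deficiency = construct - measure

# Contamination: factors the measure includes that lie outside the construct.
contamination = measure - construct

# Overlap: the shared region of the two circles.
overlap = construct & measure

print(sorted(deficiency))     # ['quality', 'timeliness']
print(sorted(contamination))  # ['corporate_sales']
print(sorted(overlap))        # ['quantity']
```

Under this reading, construct validity rises as the overlap grows and both difference sets shrink.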


C. ERRORS


There are a number of errors which evaluators may commit
during the evaluation process. Cummings and Schwab [Ref. 16]
discuss these errors in two main groups--variable error and
constant error. These two groups are explained below, with
examples.





1. Variable Error


Variable error is evaluator disagreement which manifests
itself as differences in the scores of specific items in an
evaluation. It may take two forms--disagreements between
evaluators and disagreements over time.
a. Disagreements between evaluators 


Suppose two evaluators, A and B, have observed five workers
performing their jobs and rated the workers' performances on
a scale of 0 (poor performance) to 10 ([...] performance).
The ratings are shown in Table I. Note that there is total
rating agreement only on worker 4 and the other ratings
differ from 1 to 4 units.


TABLE I
Evaluator Ratings


                    RATINGS
WORKERS     EVALUATOR A     EVALUATOR B
   1             5               3
   2             7               8
   3             3               7
   4             9               9
   5             4               0


Taking the ratings obtained from A and B, we may now plot the
scores, with evaluator A's ratings representing the
X-component of our plot and evaluator B's ratings
representing the Y-component of the plot. The result is a
graph as shown in Figure 3.2. The straight line emanating
from the origin and rising from left to right represents
total agreement between the evaluators. The distance of each
worker score from the total agreement line is a measure of
the disagreement between the evaluators. A linear correlation
coefficient may be calculated which expresses the amount of
agreement between the evaluators. Values for the linear
correlation coefficient may vary from -1.0 (highly negative
correlation, meaning that high values for the X-component
tend to go with low values for the Y-component and low values
for the X-component tend to go with high values for the
Y-component) to +1.0 (highly positive correlation, meaning
that high values for the X-component tend to go with high
values for the Y-component and low values for the X-component
tend to go with low values for the Y-component), with a value
of 0.0 indicating no correlation (no predictable pattern). In
this example, the linear correlation coefficient is 0.6,
indicating some positive correlation between evaluators A and
B. Values in the range of 0.8 to 0.9 would tend to indicate a
[...]. It simply shows th[...] what they have observed. Both
A and B may be wrong in their ratings of worker 4, but their
agreement would provide some confidence that their rating was
correct.
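
As a check on the figure quoted for this example, the linear correlation coefficient can be recomputed from the Table I ratings. The sketch below assumes the coefficient meant is the ordinary Pearson product-moment correlation; the function name `pearson_r` is ours and does not come from the source.

```python
import math

# Ratings from Table I (workers 1 through 5).
evaluator_a = [5, 7, 3, 9, 4]
evaluator_b = [3, 8, 7, 9, 0]


def pearson_r(x, y):
    """Return the Pearson linear correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each evaluator.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(evaluator_a, evaluator_b)
print(round(r, 2))  # 0.6
```

The result, roughly 0.598, rounds to the 0.6 reported in the text. Perfect agreement (identical ratings) would give 1.0.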

Two methods to reduce disagreement between evaluators are
reduction or elimination of subjectivity in measurement
instruments and ensuring evaluator familiarity with the job
being evaluated. The former method reduces disagreements by
relieving the evaluator of interpreting subjective measures.
By using more objective evaluation measures, evaluator bias
is less likely to be [...] introduced [Ref. 20:p.46].
Ensuring evaluator familiarity with the job being evaluated
increases the likelihood of evaluating job factors which
correlate highly with job performance.
b. Disagreements Over Time


Disagreements over time pertain to disagreements in
evaluations made by one evaluator at different points in
time. Suppose that, in the example of disagreements between
evaluators, evaluator A's ratings represented an evaluation
performed by A at time 1 and that evaluator B's ratings
represented an evaluation performed by A at time 2.
Calculation of the linear correlation coefficient would then
measure how well evaluator A's ratings agree over time.

Figure 3.2   Evaluator Disagreements (plot of evaluator B
ratings against evaluator A ratings, each on a scale of 0 to
10, with the line of total agreement).

Using disagreements over time as a measure of construct
validity is generally not as desirable as using disagreements
between evaluators. The reason for this is that differences
in evaluations made at different points in time may be due to
performance improvement or degradation of those being
evaluated. The low correlation coefficient obtained from a
comparison of evaluations made on a worker whose performance
has changed markedly over time may be mistakenly taken to
mean the construct is not valid. For this reason, correlation
coefficients obtained by comparing two or more evaluators'
ratings are a better measure of construct validity [Ref. 16].
A method of reducing disagreements over time, discussed
later, is testing potential evaluators and choosing those who
demonstrate little of this error.


[...] variable errors tend to create differences [...]
central tendency and leniency.

a. Halo Error


Halo error occurs when an evaluator fails to differentiate
among individual items or dimensions in his evaluation, but
evaluates on the basis of his overall impression. The boss
who observes only an employee's written [...] rates the
employee high in areas such as initiative and personal
relations has made a halo error.
b. Central Tendency


Central tendency is the tendency for evaluators to rate all
dimensions of an object near the middle of the evaluation
scale, avoiding the extremes.


c. Leniency 


This error is committed when an evaluator tends to rate all
objects too high. The 'easy grader' consistently delivers
inflated rating marks. The opposite error, that of rating all
objects too low, is called strictness.





Evaluator training in the area of constant error is a useful
technique in reducing these errors. A discussion of this
technique is presented in a later section.
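
The three constant errors can be made concrete with simple summary statistics on one evaluator's ratings. The following is a rough sketch under stated assumptions: the ratings, the 0 to 10 scale, and the three index names are all hypothetical, and these crude indices are ours rather than measures proposed by Cummings and Schwab.

```python
from statistics import mean, pstdev

# Assumed rating scale; the midpoint is the reference for leniency.
SCALE_MIN, SCALE_MAX = 0, 10
MIDPOINT = (SCALE_MIN + SCALE_MAX) / 2

# Hypothetical ratings by one evaluator:
# one row per worker, one column per performance dimension.
ratings = [
    [8, 8, 9],
    [9, 9, 9],
    [8, 9, 8],
]

all_scores = [score for row in ratings for score in row]

# Leniency (positive) or strictness (negative): distance of the
# evaluator's average rating from the scale midpoint.
leniency_shift = mean(all_scores) - MIDPOINT

# Central tendency: a small overall spread combined with a shift
# near zero suggests ratings bunched around the middle of the scale.
overall_spread = pstdev(all_scores)

# Halo: a small average spread across dimensions within each worker
# suggests the evaluator is not differentiating among dimensions.
within_worker_spread = mean(pstdev(row) for row in ratings)

print(leniency_shift, overall_spread, within_worker_spread)
```

In this made-up data the average sits well above the midpoint and each worker's dimensions are rated almost identically, which is the pattern one would expect from a lenient evaluator who also shows halo. What counts as a "small" spread is a judgment call and would need to be set against ratings from other evaluators.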

D. EVALUATION SOURCES


Evaluators may come from many places within and outside the
organization. Though evaluations by superiors are very
common, alternative sources of evaluation exist--peer,
subordinate, self and disinterested party or outside
evaluators.


1. Superior Evaluators


Evaluations by superiors are a widely used method in today's
organizations. Superiors are chosen for many reasons, such as
job experience, familiarity with subordinates' positions and
job skills, even tradition. Superiors are often the logical
choice as evaluators, for their position in the
organizational hierarchy is such that they determine to a
great extent the incentive and reward system for their
subordinates. As such, their evaluations of subordinates may
lead to direct reward or punishment without passing through
another level of hierarchy and this immediate
evaluation-incentive tie keeps subordinates apprised of their
performance.


Some problems can exist with supervisor evaluations. First,
if the subordinate being rated does not work directly for the
evaluating superior or if there is substantial physical
separation of the supervisor from the subordinates,
supervisor observation of the subordinate's job performance
may be limited. Also, due to rapidly changing technology, the
superior may not have enough understanding of the
subordinate's actual on-the-job responsibilities to
adequately rate his performance. Increasing automation in the
workplace tends to widen the 'understanding gap' for the
superior who does not strive to stay current in today's
dynamic business world.


2. Peer Evaluators
Peer evaluators are those individuals who work at the same
organizational level as the person rated. Many organizations
avoid using peer evaluations, dismissing the technique as a
'popularity contest'. Peer evaluator-evaluatee friendship is
seen as a liability of this technique. This may be due to the
perception that friends tend to minimize or overlook one
another's [...], only elevate good points, or mista[...] of
high job performance. Recent studies (e.g. Klimoski and
London [Ref. 17] [...] [Ref. 18]) have shown th[...]
significantly affected by [...].


Disinterested parties can possibly be obtained within the
organization or outside. They may come from any
organizational level so long as they have no vested interest
in the outcome of their evaluations. Some organizations bring
in outsiders to perform this function, feeling that [...]
personal contacts within the organization will allow a more
objective evaluation.

A problem which may occur with disinterested party evaluators
is that, aside from having no vested interest in the
evaluation outcome, they may also have limited insight into
the factors which indicate good job performance. As noted in
supervisor evaluation, the evaluator who does not stay
current on the technology of the workplace is not likely to
deliver as good a performance evaluation as one who is more
familiar with that technology. In addition, outsiders brought
in to perform evaluations may not fully grasp factors such as
organizational politics and interpersonal relationships which
can greatly influence overall job performance.


E. DISCUSSION


Each evaluation source has unique characteristics, as well as
similarities with each of the other sources, in providing
evaluations. Though [...] of evaluator errors is [...]
comparable for superior and peer evaluations [Ref. 19],
studies have shown that rating sources [...] [Ref. 17]. This
difference in perceptions is related to dimensionality.


(f 


Dimensionality is the quality of an evaluation area
possessing different elements or dimensions. [...] examined
the broad area of secretarial job performance, many
individual dimensions could be identified, such as typing
speed, typing accuracy, shorthand ability, [...], ability to
speak effectively on the [...] and many others. These
dimensions comprise the evaluation area called secretarial
job performance.

Not all evaluation sources use the same set of dimensions in
conducting evaluations. As an example, consider an evaluation
of worker performance performed by a worker's superior and a
peer. The superior, being very goal oriented, rates the
worker's clerical performance according to how many pages are
typed per hour assuming, perhaps incorrectly, that quantity
of pages typed also indicates quality. The peer, who must
correct any errors made by the worker, is concerned with
quality of output. Different sources exhibit different
perceptions of performance. Neither view is necessarily
wrong, but this illustrates the differences that may exist
between evaluation sources. Holzburg [Ref. 19] has found a
consistent outcome of dimensional analysis of superior and
peer evaluations is that evaluation sources determine the
primary dimensionality of the evaluations. What this means to
the evaluatee is that performance ratings received may be due
more to the evaluation source than the job performance.

The following sections discuss some of the error sources
which may cause evaluators to commit errors and methods of
reducing various errors to provide more accurate evaluations.


F. ERROR SOURCES


Many factors contribute to evaluator error. Though often
grouped under the general heading of bias, specific factors
have been investigated by a variety of study groups as a way
of ensuring objective and valid evaluations. This section
mentions several of the factors contributing to evaluator
error, and the next section discusses some methods for
reducing these errors.


1. Social Interaction


Social interaction, or friendship bias, is often cited as a
reason for avoiding peer evaluations. As previously noted,
this bias is thought by many organizations to adversely
affect peer evaluations. This bias is also seen in superior
evaluations, but judging from the number of organizations
which use superior evaluators as a primary means of
evaluation, the effects may not be considered as severe. This
is not to say that superior evaluation biases are actually
less severe than those biases found in other evaluation
sources. The biases may be just as bad, but the superior's
position tends to lend a degree of credibility to his or her
judgements, deserved or not.


2. Evaluator Experience


Evaluator inexperience and lack of training in evaluation
procedures tend to contribute to halo and leniency errors
[Ref. 20]. Poorly defined measures force the inexperienced
evaluator to make interpretations which, due to limited
background, may not accurately reflect performance. Closely
associated with this idea is the evaluator's effectiveness on
the job. Low evaluator effectiveness correlates strongly with
low evaluation accuracy.


Another factor contributing to evaluator error is the role
conflict experienced by many evaluators. Dayal has [...]:

[...] manager has to accept the responsibility to judge
[...] performance [...]. [...] this responsibility is
hesitantly taken because he feels uncomfortable in his
role as judge. [Ref. 21:p.29]


One effect of this evaluator discomfort is that evaluation
results tend to group near the upper end of the rating scale
[Ref. 21]. A possible reason for this effect is that low
ratings may result in slower promotion or even firing of an
employee, for which the evaluator giving the rating may feel
responsible. Ratings at the high end of the scale reduce the
probability that employees will experience layoffs or slower
promotion and the evaluator will feel less responsible if
such actions do occur.


4. Evaluator Knowledge of Evaluation Purpose 


As previously stated, Scriven [Ref. 11] has suggested that
evaluator knowledge of the evaluation purpose may be another
nonperformance factor influencing the actual performance
rating received. A study by Gallagher [Ref. 22] investigated
whether ratings of performance varied when evaluators were
given different purposes for the evaluations. The results
support Scriven's contention. Gallagher's discussion of the
results concludes "...that a single performance evaluation
should not be used for different purposes since the stated
purpose of the evaluation can [...] the actual performance
rating." [Ref. 22:p.38]


G. ERROR REDUCTION TECHNIQUES


Many techniques are available to help reduce evaluator error.
These techniques have been investigated by various evaluation
researchers (e.g. Bernardin [Ref. 23], Wiley and Jenkins
[Ref. 24], and Scott [Ref. 20]) and some suggested solutions
are presented here.


Bernardin, in a study of comprehensive vs. abbreviated
evaluator training programs, found that evaluators "who were
trained on error prior to observation and who used the [...]
to maintain observational diaries had significantly less
leniency error and halo effect than all other groups."
[Ref. 23:p.302] In this study comprehensive training was a
one hour session consisting of definitions, graphic
illustrations and examples of halo error, leniency and
central tendency as presented to students who were acting as
evaluators of peer performance. The trainees were also given
data to evaluate in terms of the errors, and the evaluations
were discussed. Abbreviated training was a five minute
session with definitions of the error types and a single
illustration of each.





The results of this study indicated that the psychometric
quality for those who underwent comprehensive training was
superior to those who received abbreviated training at the
first rating period, and both training groups were superior
to the control (untrained) group. Another result was that the
positive effects of the training programs were virtually
nonexistent after one additional rating period. [Ref. 23] One
might argue that for an organization contemplating a training
program for supervisor personnel the above information may
indicate that a comprehensive training program would lead to
fewer evaluator errors than an abbreviated training program.
As the effects of both training programs tend to rapidly
diminish with time, however, a [...] regularly administered
may deliver more positive effects in the long run.

2. Dimensional Analysis


As discussed previously, different evaluation sources
perceive performance [...] ways. To account for [...],
subjective evaluation areas should be examined by dimensional
analysis. This analysis is used to investigate the many
dimensions which comprise an evaluation area and considers
the different combinations of dimensions used by various
evaluation sources. Since each evaluation source tends to use
different dimensions in performing evaluations
[Ref. 25:p.473], dimensional analysis can provide insight
into the particular concerns of the various sources. Klimoski
and London [Ref. 17] present the example that supervisors may
be less able to discriminate between items related to
competence from those related to effort, whereas nurses
rating themselves and peers can make that distinction. This
would suggest that supervisors are more likely to consider
effort as an indicator of competence than peers. By
accounting for the dimensions used by various evaluation
sources, dimensional analysis can allow performance measures
to be tailored according to the anticipated evaluation
source, or it may be used after the fact to help explain
ratings received in particular areas in light of the
evaluation source.


3. Testing Evaluators


Wiley and Jenkins [Ref. 24] had 109 Air Force navigator
students estimate qualifications needed to perform various
Air Force tasks using an experimentally standardized task
list and sets of five rating scales. Their estimates were
aggregated and a consensus or pooled estimate group was
formed. These students, after one month, again estimated
qualifications and the students were scored by correlating
their estimates with the key of pooled estimates. The study
shows that evaluators who tend to agree with the consensus
also tend to retest self-agreement. These evaluators also
tend toward consensus agreement on later evaluations.
[Ref. 24]

The above findings tend to suggest that a standardized test
could be developed to rate potential evaluators. A consensus
key which corresponds to the organization's view of
performance would make it possible to select evaluators with
corresponding views. This would help ensure organizational
goals are being pursued by the evaluation process.
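
The scoring idea described above, correlating each evaluator's estimates with a pooled consensus key, can be sketched in a few lines. This is an illustration only: the estimates are invented, the key is assumed here to be the simple mean across evaluators (the source does not state how the pooled estimate was formed), and the names `pearson_r`, `consensus_key` and `scores` are ours.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation of two equal-length lists.

    Assumes neither list is constant (a constant list has zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical estimates: candidate evaluator -> estimate for each of four tasks.
estimates = {
    "E1": [2, 4, 6, 8],
    "E2": [1, 4, 7, 9],
    "E3": [8, 3, 6, 2],
}

n_tasks = 4

# Pooled consensus key: assumed to be the mean estimate per task.
consensus_key = [
    sum(vals[t] for vals in estimates.values()) / len(estimates)
    for t in range(n_tasks)
]

# Score each candidate by agreement with the consensus key.
scores = {name: pearson_r(vals, consensus_key) for name, vals in estimates.items()}

# Rank candidates from most to least agreement with the consensus.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these made-up numbers, E1 and E2 track the key closely while E3 correlates negatively with it, so E3 would be the candidate least aligned with the consensus view. An organization could substitute its own key for the pooled mean if it wished to select for a particular view of performance.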


4. Reducing Subjectivity of Evaluation Measures 


Performance appraisal systems are commonly regarded as being
too subjective in nature, relying primarily on human
judgement for gathering information pertaining to measures
[Ref. 20]. Elimination of all factors which can not be
objectively measured would naturally lead to minimal
subjectivity. While this elimination may or may not be [...],
it is possible to develop a system where the evaluator reacts
to stimuli which are relatively free of subjective or
irrelevant influences rather than stimuli which require the
evaluator's judgement [Ref. 16:p.89-92]. The stimuli take the
form of actual on-the-job incidents which the evaluator
simply observes without interpretation. These incidents, or
'critical behaviors', represent actions associated with
outstandingly successful or outstandingly unsuccessful task
performance. The evaluator in this role acts as a reporter of
actions rather than a judge who values actions [Ref. 20].


One problem associated with this method is the choice of
critical incidents or behaviors. Some person or group of
people must be designated to decide wh[...] are to be used in
evaluation. Providing a list [...] incidents reduces the
evaluator's need to exercise personal judgement in conducting
evaluations.


H. SUMMARY


This chapter has investigated the evaluator as part of the
scheme of evaluation. The concepts of objectivity and
validity were introduced and explained as they pertain to
evaluation. Sources of evaluator error were then discussed.
Evaluator errors were divided into variable and constant
errors, and each of these areas was broken into specific
error types. Various evaluator sources--superior, peer and
disinterested party--were discussed with advantages and
disadvantages of each source considered. A discussion of
error sources, along with techniques to reduce these errors,
closes the chapter. The last section suggests that training
and testing evaluators and taking measures to reduce the
subjectivity of evaluation measures can have a significant
effect in reduction of evaluator error.





The next chapter uses the information presented in Chapters
II and III to analyze the MCCRES and offer some suggestions
for identifying and controlling or controlling for potential
evaluator bias.


IV. MCCRES


The purpose of the Marine Corps Combat Readiness
Evaluation System (MCCRES) is to provide a timely and
accurate evaluation of the readiness of Fleet Marine
Forces, including Reserve units, to accomplish assigned
missions. [Ref. 26]


To achieve the objective of timely and accurate readiness
evaluation, the MCCRES has been designed to allow observation
of Marine units in simulated combat situations. It promotes
use of a standardized evaluation process and reporting system
to provide feedback to the evaluated unit indicating
strengths and weaknesses in a combat readiness posture. This
chapter focuses on the evaluation process in an attempt to
identify areas where evaluators may commit errors or inject
bias into the evaluation, possibly leading to inaccurate
readiness ratings. The general evaluation approach and
structure of the MCCRES are discussed first, followed by an
investigation of potential sources of error. The final
section discusses some solutions to minimize the effects of
evaluator bias.


A. APPROACH 


The MCCRES approach to evaluation may be compared with
the Professional Review (Accreditation) Approach forwarded
by House [Ref. 1]. It is an evaluation system conceived
within the Marine Corps, graded by Marines and using standards
developed by Marines. As such, it closely parallels
the Professional Review Approach. In this approach, a
particular profession sets standards of performance for
itself and conducts internal evaluations. The reasoning for
the internal evaluations is that members of that profession
are considered experts in that field.







In choosing evaluators to perform MCCRES evaluations, it
is desirable that evaluators have recently served successfully
in a billet relating to the function they are to
observe. This means, for example, that a Rifle Company
evaluator should have recently served successfully as a
Rifle Company commander. Successful recent billet performance
increases the probability that evaluators will recognize
adequate mission performance.


B. STRUCTURE


The MCCRES evaluation structure is a four-tiered hierarchy
as shown in Figure 4.1. Of particular importance to
this discussion are the bottom two layers: the Tactical
Exercise Controller (TEC) and the Evaluators. It is here
that mission performance is observed, analyzed and reported.


        EVALUATION/EXERCISE COMMANDER
                      |
        EVALUATION/EXERCISE DIRECTOR
                      |
        TACTICAL EXERCISE CONTROLLER
                      |
                 EVALUATORS


Figure 4.1 MCCRES Evaluation Structure.







1. Tactical Exercise Controller


The TEC compiles and analyzes the results of the
evaluations which have been submitted via the evaluators'
data sheets and submits a formal report to the Exercise
Director. Among the TEC's duties and responsibilities are
determination of specific Mission Performance Standards to
be tested, extensive and detailed training of evaluators,
development and control of intelligence play throughout the
problem, and organization of the Tactical Exercise Control
Group to plan and conduct the exercise. The TEC relies on
the evaluators to report exercise progress and mission
performance of the evaluated units. The former information
is received primarily via radio communication while the
latter arrives in the form of evaluator data sheets.





2. Evaluators


Evaluators have three main roles in the MCCRES:
1. Exercise controllers to ensure the exercise
proceeds as planned.
2. Umpires to resolve disagreements between
exercise and aggressor forces.
3. Performance evaluators to observe task
performance as related to Mission Performance
Standards being graded.

As an exercise controller, evaluators work as an
extension of the will of the TEC. They may increase or
decrease the operational tempo of the problem through the
use of such items as aggressor forces, intelligence reports
or simulated fires. They may create situations which require
reaction by the evaluated unit by insertion of prescribed
events into the play of the tactical problem. Action
observed at this level is provided to the TEC primarily by
radio to assist the TEC in determining if the exercise pace
is satisfactory.







As umpires, evaluators are tasked with resolution of
disagreements which may occur between evaluated units and
aggressor forces. For example, if an evaluated unit was
ambushed by an aggressor force, an evaluator would make a
determination as to the outcome of the ambush and assess
casualties accordingly.

In the role as performance evaluators, evaluators
observe unit performance of prescribed tasks and make a
determination as to the unit's ability to satisfactorily
carry out the task. These determinations are recorded as
"YES", "NO" or "NOT APPLICABLE" marks on the evaluator data
sheet. A mark of "YES" denotes that all facets of a particular
requirement were met. Conversely, a "NO" mark shows
that not all portions of a requirement were met. "NOT
APPLICABLE" areas are those not tested or which do not apply
to the scenario at hand.

Having discussed the general roles of the evaluator,
two topics are presented to help explain how MCCRES evaluators
are organized and what measures are used in making a
determination of combat readiness. The first, Senior
Evaluators, explains the duties and relationships of this
MCCRES member to the rest of the evaluators. The second,
Mission Performance Standards, looks at the composition of
the measures used in conducting the MCCRES.


a. Senior Evaluators 


Each unit evaluated has a senior evaluator who
conducts a post exercise wrap-up and compiles the data
sheets from all subordinate evaluators. At this wrap-up,
resolution of each "YES", "NO" and "NOT APPLICABLE" rating
is made for each requirement tested. This resolution of the
evaluators' data sheets results in "YES", "NO" or "NOT
APPLICABLE" ratings for each requirement as it pertains to
the entire unit. The senior evaluator provides his data
sheets to the TEC for compilation and further use by the
TEC. An assessment of "COMBAT READY" or "NOT COMBAT READY"
for the entire unit is also passed to the TEC by the
senior evaluator.

The senior evaluator's relationship with other
evaluators is a senior-subordinate type. Senior by position
and generally by military rank, the senior evaluator is in
charge of the evaluation team and is responsible for evaluating
the performance of the entire unit being evaluated.
The senior evaluator is appointed by name by the Exercise
Director (an officer senior to the commander of the organization
being evaluated) and, as such, maintains an independent
relationship with the commander of the unit being evaluated. Other
members of the evaluation team, subordinate to the senior
evaluator, are responsible for evaluating the subordinate
units (both organic and attached) and other organizational
functions (such as command and control and fire support
coordination) of the overall unit being evaluated.

b. Mission Performance Standards


Mission Performance Standards (MPS's) are standards
of task performance used in MCCRES. Each standard is
composed of various tasks. For example, the MPS Continuing
Actions By Marines is composed of twelve tasks such as
Discipline, Dispersion, Security and Casualty Handling.
These tasks are further divided into conditions and requirements.
Conditions specify the circumstances under which
requirements must be performed and provide recommendations
to the evaluator concerning time and space limitations which
may be imposed on the evaluated unit. Requirements are
specific actions which must be performed or behaviors which
must be demonstrated in the accomplishment of a given task.
The task Discipline, for instance, contains nine requirements
ranging from Self Discipline and Weapons Maintenance
Discipline to Hygienic Discipline. Requirements which may
need further information to guide evaluators in the determination
of satisfactory performance are provided with Key
Indicators (KI's) of performance. KI's are an attempt to
provide an objective foundation upon which to base an evaluator's
judgement of satisfactory requirement performance.
They should provide specific, measurable actions or behaviors
which must be present for the requirement to be
successfully completed.

Consider the KI for the requirement Weapons
Maintenance Discipline. "Marines take care to clean their
weapons, both individual and crew served, daily. Weapons are
safeguarded. Care of weapons enforced by leaders." The KI
tells what is to be done (clean weapons, both individual and
crew served), when it is to be done (daily), who does it
(Marines), and who supervises (leaders). KI's for other
requirements provide similar types of information to make
requirements more objectively measurable by the evaluator.
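
The hierarchy described above (standard, task, conditions and
requirements, optional Key Indicator) can be sketched as a small
data structure. This is only an illustration: the class and
function names are hypothetical and are not part of the MCCRES,
and only two of the nine Discipline requirements are filled in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Requirement:
    name: str
    # KI text; present only when the requirement needs extra guidance.
    key_indicator: Optional[str] = None


@dataclass
class Task:
    name: str
    conditions: List[str] = field(default_factory=list)
    requirements: List[Requirement] = field(default_factory=list)


@dataclass
class MissionPerformanceStandard:
    name: str
    tasks: List[Task] = field(default_factory=list)


weapons = Requirement(
    name="Weapons Maintenance Discipline",
    key_indicator=(
        "Marines take care to clean their weapons, both individual and "
        "crew served, daily. Weapons are safeguarded. Care of weapons "
        "enforced by leaders."
    ),
)
# Only two of the nine Discipline requirements are shown here.
discipline = Task(
    name="Discipline",
    requirements=[Requirement("Self Discipline"), weapons],
)
mps = MissionPerformanceStandard(
    name="Continuing Actions By Marines", tasks=[discipline]
)


def requirements_with_ki(standard):
    """Return the names of requirements that carry a Key Indicator."""
    return [
        req.name
        for task in standard.tasks
        for req in task.requirements
        if req.key_indicator
    ]
```

Calling requirements_with_ki(mps) lists the requirements for
which the evaluator has a KI to lean on; the remainder depend on
the evaluator's own reading of the requirement.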


C. POTENTIAL PROBLEMS


This section discusses the areas in which evaluators may
inject bias into the MCCRES. The discussion is presented in
three parts: senior evaluator influence, other evaluator
bias and MPS problems. Some general solutions to these problems
are suggested here with more specific solutions
presented in the following section.


1. Senior Evaluator Influence





The senior evaluator can inject bias in two major
ways. First, as the senior member of the evaluation team,
he or she sets the tone for the other evaluators. If the
senior evaluator projects a hard-line, "by the book"
approach toward the evaluation, evaluators may tend to view
task requirements with little flexibility. On the other
hand, in a situation where the senior evaluator projects a
less rigorous attitude toward the evaluation, evaluators may
tend to view task requirements less rigidly. As a result of
evaluator perceptions of the senior evaluator's wishes, the
evaluation delivered may be biased.

The second major way in which the senior evaluator
may inject bias is in the resolution of other evaluators'
ratings. These ratings are obtained from the data sheets of
the other evaluators. The senior evaluator depends upon the
observations made by the other evaluators to provide data
which accurately reflects the performance of the entire
unit. Depending on the senior evaluator's perceptions of the
other evaluators' competence and on his own perception of
successful task completion, the senior evaluator's data for
the TEC may or may not accurately reflect the overall unit
abilities. As an example, suppose an infantry battalion
conducted an attack on an aggressor force and that two of
the companies performed extremely well while one company
performed poorly. If, in the senior evaluator's opinion,
the offending company's performance was not critical to the
entire unit's mission performance, a rating of "YES" could
be delivered for the battalion for the task "ATTACK" as it
pertains to the entire unit. [Ref. 26:p.I-C-8] On the other
hand, if the senior evaluator felt the one company's
performance was such that it negated the accomplishments of
the other two companies, a rating of "NO" could conceivably
be returned for the battalion for the task "ATTACK" as it
pertains to the entire unit. The senior evaluator made a
decision based on personal judgement, possibly reflecting
the unit's mission performance inaccurately.
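
The battalion example can be restated as a short sketch in which
identical company ratings resolve to different unit ratings,
depending only on which companies the senior evaluator judges
critical. The function, the company labels and the "critical"
rule are hypothetical; the MCCRES leaves this resolution to the
senior evaluator's judgement and prescribes no such formula.

```python
def resolve_unit_rating(company_ratings, critical_companies):
    """Return "NO" if any company judged critical was rated "NO",
    otherwise "YES".

    company_ratings: dict mapping company label to "YES" or "NO".
    critical_companies: set of labels the senior evaluator considers
    critical to the unit's mission (a personal judgement).
    """
    for company, rating in company_ratings.items():
        if rating == "NO" and company in critical_companies:
            return "NO"
    return "YES"


# Two companies performed well, one performed poorly.
attack_ratings = {"A": "YES", "B": "YES", "C": "NO"}

# Senior evaluator 1 does not consider Company C critical.
first_view = resolve_unit_rating(attack_ratings, critical_companies={"A", "B"})

# Senior evaluator 2 considers every company critical.
second_view = resolve_unit_rating(
    attack_ratings, critical_companies={"A", "B", "C"}
)
```

The observed performance is the same in both calls; only the
senior evaluator's judgement differs, which is the point at
which bias can enter.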


2. Other Evaluator Biases


The evaluators who observe task performance and
report to the senior evaluator are presented with a
continual opportunity to inject bias into the MCCRES. The
discussion of the areas where these evaluators may inject
bias is organized in two groups: errors and evaluator
sources.

a. Errors


Evaluator bias manifests itself as any deviation
from the objective "truth" concerning an evaluated unit's
performance. In this respect, bias may take the form of
errors of leniency, strictness or halo effect. The first two
errors result in ratings which are respectively too "easy"
or too "hard", while the last error tends to cause ratings
to group around one value on the rating scale. To illustrate,
consider an evaluator rating the requirement
Equipment Maintenance. The first portion of the KI for this
requirement states "Vehicles, generators, etc., are given
close attention by the Marines assigned to operate them."
[Ref. 26:p.II-A-6] The lenient evaluator may consider visual
observation each four hours constitutes close attention,
while a strict evaluator considers maintenance conducted
every other hour as an indicator of close attention. If a
Marine is observed by these two evaluators checking his
assigned equipment at strict four hour intervals because
that is what the operating manual calls for, he will receive
a different rating from each of the evaluators. In this
case, the second evaluator has injected bias by committing
the error of strictness.
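
The Equipment Maintenance example amounts to two evaluators
applying different private thresholds to the same observation.
A minimal sketch follows; the function name is hypothetical, and
the four-hour and two-hour thresholds are simply the readings of
"close attention" attributed to the lenient and strict
evaluators above.

```python
def rate_close_attention(check_interval_hours, max_interval_hours):
    """Rate "YES" when equipment is checked at least as often as the
    evaluator's personal reading of "close attention" demands."""
    return "YES" if check_interval_hours <= max_interval_hours else "NO"


# The Marine checks his equipment every four hours, per the manual.
observed_interval = 4.0

# Lenient evaluator: a check each four hours is close attention.
lenient_rating = rate_close_attention(observed_interval, max_interval_hours=4.0)

# Strict evaluator: only a check every other hour is close attention.
strict_rating = rate_close_attention(observed_interval, max_interval_hours=2.0)
```

The same behavior earns a "YES" from one evaluator and a "NO"
from the other because the KI itself supplies no threshold.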

As an illustration of halo error, suppose an
evaluator is rating a unit on a task which contains five
requirements. At the outset of the observation period, the
unit was particularly outstanding in carrying out the first
requirement. Based upon the outstanding performance the
evaluator expects similar performance for the other
requirements of the task. Such expectations may influence
the evaluator to "see" only outstanding performance.
Mistakes and poor performance are viewed with the attitude
that "...they really know better, they just weren't paying
attention today...". As a result of this attitude, a "YES"
rating is delivered for the entire task, even though not all
requirements were successfully completed. This evaluator has
committed a halo error since the rating has been influenced
by the outstanding performance of only one requirement of
the entire task. It must be noted that this error can also
be observed in the opposite sense; that is, a particularly
bad observation can bias the evaluator to view an entire
task unfavorably.

b. Evaluation Sources


In the previous discussion of the three main
sources of evaluation (superior, peer and disinterested
party) it was shown that the first two sources demonstrate
fairly comparable error introduction but may vary greatly
in perceptions of task performance. This difference in
perception is related to the dimensionality of the task being
evaluated. In the context of MCCRES this means that superiors
may not perceive task performance in the same way as
peers. The last evaluation source, the disinterested party,
brings with it the potential problem of not understanding
the process being graded.

Many of the potential problems associated with
various evaluation sources are diminished by two MCCRES
stipulations concerning evaluators. The first stipulation
is that evaluators should have recently served a successful
tour in a billet related to the one they are evaluating. A
key word in this stipulation is recently. Since billets in
the Marine Corps have ranks associated with them, the
differential dimensionality of senior and peer evaluators is
limited by ensuring evaluators have recently filled a billet
similar to the one they are evaluating. In other words, an
evaluator who has recently served in a billet similar to the
one he is evaluating is more likely to recognize those task
dimensions which indicate successful task performance than
an evaluator who has not recently held such a position.

Besides the problems associated with differential
dimensionality between evaluation sources, social
interaction between sources and the evaluated unit can be
problematic. Both seniors and peers within an organization
tend to interact in formal as well as informal ways. This
informal or social interaction may be carried into the
evaluation as a bias. The second stipulation states "...it is
desirable that evaluators be obtained from commands not
directly related to the organization being evaluated."
[Ref. 26:p.I-C-9] This may result in a reduction of bias
created by social interaction. This reduction
is due to decreased daily interaction between members of
adjacent units as compared to daily interactions among
members of a single unit.


3. Mission Performance Standards





All of the evaluation sources have one thing in
common; they use the Mission Performance Standards to evaluate
unit combat readiness. A potential problem associated
with the MPS's is their subjectivity. This subjectivity
permits evaluator interpretation of standards which may
result in biased evaluations.

To determine the extent of the MPS's subjectivity,
the requirements for the MPS's Continuing Actions By
Marines, Command And Control and Fire Support Coordination
were examined. The criterion used to determine the
subjectivity of a requirement was the ability of the
requirement to be quantified. If the requirement was
expressed in terms which are physically measurable, such as
limits of time or distance, then it was considered objective.
Requirements containing phrases which require interpretation
by the evaluator, such as "...close attention...", were
considered subjective. The meaning of these requirements can
vary depending upon the evaluator's interpretation of the
requirement's wording.

Of the 243 requirements for the above MPS's, 15 were
found to be susceptible to evaluator interpretation. This is
approximately 6.2 percent of the requirements for these
three MPS's. These 15 requirements contain phrases such as
"...close attention..." or "...processed with speed..." to
describe satisfactory requirement performance. Without clear
guidance as to what constitutes "close attention" or
processing "with speed", different evaluators may interpret
the requirement to have different meanings. This difference
in interpretation means that two evaluators observing a
particular requirement being performed could return
different ratings of requirement performance, depending on
how the requirement is interpreted. For each of the 15
requirements the requirement number and the subjective
phrase contained in the requirement are listed in Table II.
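
The screening criterion and the resulting proportion can be
illustrated with a brief sketch. The phrase list below is a
hypothetical stand-in built from phrases quoted in this chapter;
the actual screening was a manual reading of each requirement,
not a keyword search.

```python
# Phrases of the kind this chapter treats as subjective (illustrative).
SUBJECTIVE_PHRASES = (
    "close attention",
    "with speed",
    "timely manner",
    "neat and orderly",
)


def is_subjective(requirement_text):
    """Flag a requirement whose wording contains an unquantified phrase."""
    text = requirement_text.lower()
    return any(phrase in text for phrase in SUBJECTIVE_PHRASES)


def subjective_share(subjective_count, total_count):
    """Percentage of requirements open to interpretation, to one decimal."""
    return round(100.0 * subjective_count / total_count, 1)


# Figures reported above: 15 of 243 requirements.
share = subjective_share(15, 243)

# Two made-up requirement wordings for illustration only.
flagged = is_subjective("Messages are processed with speed.")
not_flagged = is_subjective("Report is submitted within two hours.")
```

The computed share reproduces the figure of approximately 6.2
percent, since 15/243 is about 0.0617.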


D. POTENTIAL PROBLEMS PERCEIVED BY FIELD USERS 


Six Marine officers attending the Naval Postgraduate
School were interviewed to gain an insight into potential
MCCRES problems as perceived by users in the field. The six
officers ranged in grade from O-2 to O-4 and represented
MOS's 0302 (Infantry Officer), 1302 (Engineer Officer), 7562
(Pilot HMM CH-46) and 7587 (Airborne Radar Intercept
Officer, F-4N/J/S). The interview consisted of three
questions:

1. Do you feel that an evaluator can affect a
MCCRES evaluation through personal bias?

2. How is this bias input?

3. In what areas do you feel bias is most likely to
occur?

The results of these interviews demonstrated that there was
close agreement on each of the questions across both MOS and
grade. All interviewees felt that an evaluator could affect
a MCCRES evaluation through personal bias. This bias was
seen as being input through evaluator interpretation of
performance criteria. These criteria take the form of task
requirements. Responses to the last question indicate field
users felt bias is most likely to occur in those areas to
which numerical measures are not easily attached. The
areas which lend themselves to quantifiable measurement are
less likely to contain evaluator bias than non-quantifiable
areas.


TABLE II

MPS Requirements Susceptible to Evaluator Bias

Requirement     Subjective Phrase

...             "...use to a minimum"
...             "...material safeguarded"
...             "processed with speed"
...             "...security"
...             "Safeguards classified material"
...             "neat and orderly"
...             "dispersed to reduce vulnerability"
...             "dispersed"
...             "closely monitors"
...             "timely manner"
...             "accurate plots"
...             "closely monitors"

Comparison of potential problems with MCCRES as
perceived by the sample of field users to the potential
problems outlined in the previous section shows that the
field users' perceptions are a subset of the potential problems
discovered through analysis of the MCCRES.







E. RECOMMENDED SOLUTIONS


The problems discussed in the previous two sections
demonstrate the variety of ways in which an evaluator may
introduce bias into a MCCRES. In order to minimize biased
input, three possible solutions to the bias problem are
forwarded. These solutions are evaluator training, evaluator
testing and quantification of subjective MPS requirements.


1. Evaluator Training


As previously noted, evaluator training has proved
to be an effective tool in reduction of evaluator errors.
Bernardin [Ref. 23] showed that evaluators receiving comprehensive
training show greater error reduction results than
evaluators receiving limited training. Both of these groups
show less error than evaluators who have received no
training.

Current MCCRES standards task the TEC with
conducting extensive and detailed training of evaluators. In
the experience of several officers attending the Naval
Postgraduate School, who were questioned concerning evaluator
training, this training is geared toward educating the
evaluator on the exercise scenario with no specific mention
of the errors which evaluators typically commit. By making
MCCRES evaluators aware of the errors typically committed by
evaluators, the MCCRES evaluators are less likely to commit
these errors, reducing biased input. An evaluator training
package addressing both scenario development and possible
evaluator error should be created to more fully exploit the
potential of comprehensive evaluator training outlined by
Bernardin [Ref. 23].

Another aspect of evaluator training is ensuring
potential evaluators are well-versed in the areas they are
chosen to evaluate. Choosing knowledgeable evaluators tends
to increase the probability that those factors which indicate
successful task performance are considered during the
evaluation.

One method to ensure trained, knowledgeable evaluators
for MCCRES evaluations is formation of a formal MCCRES
evaluation team. By choosing team members who have demonstrated
proficiency in their MOS's and keeping them current
in both their MOS's and evaluation techniques through
training, a skilled cadre of evaluators can be assembled.

Some of the advantages of forming a formal MCCRES
evaluation team are minimization of evaluator training
costs, minimization of social interaction with evaluated
units and a more standardized evaluation base. Evaluator
training costs are minimized since the same evaluators are
frequently used. Although training effects diminish rapidly
with time, retraining for each successive evaluation could
lead to a learning curve, reducing costs over time.
Social interaction is minimized due to lower daily contact
with evaluators, as opposed to the interaction which occurs
among adjacent commands. The last factor, standardization of
the evaluation base, results from the continuity of the
formal evaluation team.

A MCCRES evaluation team could be composed of
personnel from units such as Division Schools, or it could
reside outside the active duty forces at a Reserve unit,
since the MCCRES is to evaluate both active and reserve
forces. Having reserves evaluate MCCRES would also offer the
additional benefit of keeping the reserve up to date and
strengthening the tie between active and reserve forces in
the Marine Corps.


2. Evaluator Testing 





Evaluator testing can be seen as a method of both
identifying and controlling for evaluator bias. In the
former case, a test can be constructed which would indicate
the areas in which a prospective evaluator demonstrates
bias. By testing a number of these prospective evaluators,
those who demonstrate little or no bias could be chosen to
conduct MCCRES evaluations, thereby minimizing the likelihood
of evaluator bias input. For instance, consider a test
in which evaluators are graded according to their agreement
with an answer key. Further, suppose the answer key is
composed of the pooled answers of a group of "unbiased"
evaluators. As suggested by Wiley and Jenkins
[Ref. 24:p.217], evaluator agreement with the key can be
used to predict the likelihood of evaluator bias. Those
evaluators showing close agreement with the key of "unbiased"
answers can be chosen to perform evaluations.
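
A minimal sketch of such a test follows. It assumes that the key
is pooled by taking the most common answer to each item among the
reference evaluators and that agreement is scored as the fraction
of matching answers; both are illustrative assumptions, not the
procedure of Wiley and Jenkins, and all names and data are
hypothetical.

```python
from collections import Counter


def pooled_key(reference_sheets):
    """Build an answer key from the most common answer to each item.

    reference_sheets: list of equal-length answer lists, one per
    reference ("unbiased") evaluator. An odd number of sheets avoids
    ties when the answers are "YES"/"NO".
    """
    n_items = len(reference_sheets[0])
    return [
        Counter(sheet[i] for sheet in reference_sheets).most_common(1)[0][0]
        for i in range(n_items)
    ]


def agreement(candidate_sheet, key):
    """Fraction of items on which the candidate matches the key."""
    matches = sum(1 for c, k in zip(candidate_sheet, key) if c == k)
    return matches / len(key)


reference = [
    ["YES", "NO", "YES", "YES"],
    ["YES", "NO", "NO", "YES"],
    ["YES", "YES", "YES", "YES"],
]
key = pooled_key(reference)

candidate = ["YES", "NO", "NO", "NO"]
score = agreement(candidate, key)
```

Prospective evaluators could then be ranked by score, with those
closest to the key preferred for selection.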

The same test, analyzed differently, can be used to
control for evaluator bias. For instance, the results of the
test are analyzed to discover in which areas an evaluator's
bias exists. From this analysis a "bias profile" could be
constructed which could allow evaluation results to be
"standardized". For example, assume a MCCRES evaluator's
bias profile showed significant deviation toward strictness
in the area of discipline. During the conduct of a MCCRES
evaluation a senior evaluator notes this evaluator's data
sheet has a "NO" rating for many of the requirements of the
task DISCIPLINE. The senior evaluator, knowing that this
evaluator tends to be particularly strict in evaluating
discipline, may wish to obtain additional performance
information concerning the unit evaluated, since the evaluator's
ratings may not accurately reflect the unit's actual
performance.
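
One way a "bias profile" might be computed from test results is
sketched below. It assumes each test item is tagged with an area
and scored against the key, with "YES" counted as 1 and "NO" as
0; the scoring scheme, names and data are hypothetical and serve
only to make the idea concrete.

```python
from collections import defaultdict

SCORE = {"YES": 1, "NO": 0}


def bias_profile(items):
    """Mean signed deviation from the key, per area.

    items: iterable of (area, candidate_rating, key_rating) tuples.
    A negative value means the candidate rates more strictly than the
    key in that area; a positive value means more leniently.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for area, candidate, key in items:
        totals[area] += SCORE[candidate] - SCORE[key]
        counts[area] += 1
    return {area: totals[area] / counts[area] for area in totals}


test_items = [
    ("Discipline", "NO", "YES"),
    ("Discipline", "NO", "YES"),
    ("Discipline", "YES", "YES"),
    ("Discipline", "NO", "NO"),
    ("Security", "YES", "YES"),
    ("Security", "NO", "NO"),
]
profile = bias_profile(test_items)
```

A senior evaluator holding such a profile would know to look
twice at this evaluator's "NO" marks under Discipline while
taking the Security marks at face value.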

3. Quantification of MPS's


The last method of controlling evaluator bias is
quantification of subjective MPS requirements. This
quantification, as Scott [Ref. 20] suggests, reduces the
evaluator's task from interpreting MPS requirements and
comparing task performance with this interpretation to
reporting whether task performance meets the requirements.
For example, instead of trying to decide how fast the phrase
"...process with speed..." is, reporting whether the unit
was able to "...process within two hours..." is less open to
interpretation. The more concrete the requirement, the less
evaluator interpretation that will take place in grading,
resulting in reduced evaluator bias. Some of the quantifications
may be less concrete than others. Some requirements
may be constructed in terms of ranges of acceptable performance
for differing tactical scenarios. Still, the ranges
serve to bound the amount of interpretation required by the
evaluator.
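
A quantified requirement of this kind reduces the evaluator's
role to reporting a measurement. The sketch below uses the
two-hour limit from the example above; the scenario names and
the three-hour figure are hypothetical placeholders for
scenario-dependent ranges.

```python
# Acceptable processing time by tactical scenario, in hours.
# Only the two-hour figure comes from the example in the text;
# the scenario labels and the other value are illustrative.
PROCESSING_LIMIT_HOURS = {
    "offense": 2.0,
    "defense": 3.0,
}


def rate_processing(elapsed_hours, scenario):
    """Return "YES" when the measured time is within the scenario limit."""
    limit = PROCESSING_LIMIT_HOURS[scenario]
    return "YES" if elapsed_hours <= limit else "NO"


within_limit = rate_processing(1.5, "offense")
over_limit = rate_processing(2.5, "offense")
other_scenario = rate_processing(2.5, "defense")
```

Any two evaluators who record the same elapsed time will return
the same rating, which is the reduction in interpretation the
quantification is meant to achieve.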


F. CONCLUSIONS


In the introduction of this paper two questions are
posed. The first asks if factors of the MCCRES evaluation
which are subject to evaluator bias can be identified, and
the second asks how these factors can be controlled or
controlled for. It has been shown that areas in which evaluators
may bias the MCCRES can be identified and comprise
three basic areas: senior evaluator influence, other evaluator
bias and MPS interpretation.

As methods of controlling or controlling for these
factors, three techniques were forwarded: evaluator
training, evaluator testing and quantification of subjective
MPS requirements. Each of these techniques has potential for
controlling bias.


G. RECOMMENDATIONS FOR FUTURE RESEARCH


Discussion of the proposed solutions to the problem of
evaluator bias did not address the cost to implement the
solutions. A study of benefits and costs for each of the
solutions would provide additional information as to the
feasibility of the solutions. In addition, a detailed study
of the proposed solutions would be likely to point out
several methods of implementation for each, possibly
revealing still other solutions not addressed in this
thesis.









LIST OF REFERENCES 


Levitan, S. A., and G. Wurzburg, Evaluating Federal
Social Programs, W. E. Upjohn Institute for Employment
Research, 1979.

Riecken, H. W., "Action for What? A Critique of
Evaluative Research," in Evaluating Action Programs:
Readings in Social Action and Education, ed. Carol H.
Weiss, Allyn and Bacon, 1972.





Stufflebeam, D. L., W. J. Foley, W. J. Gephart, E. G.
Guba, R. L. Hammond, H. O. Merriman, and M. M.
Provus, Educational Evaluation and Decision Making, F.
E. Peacock Publishers, Inc., 1971.

Anderson, S. B., and S. Ball, The Profession and
Practice of Program Evaluation, Jossey-Bass Inc.
Publishers, 1978.

Tracey, W. R., Evaluating Training and Development
Systems, American Management Association, 1968.

Langston, J. H., "OEO Neighborhood Health Centers:
Evaluation Case Study", in Social Experiments and
Social Program Evaluation, eds. J. G. Abert and M.
Kamrass, p. 107-127, Ballinger, 1974.


Drucker, P. F., The Practice of Management, Harper and
Brothers, 1954.


Schuster, F. E., and A. F. Kindall, "Management by
Objectives: Where We Stand- A Survey of the Fortune
500," Human Resource Management, v. 13, no. 1, Spring
1974.

Scriven, M., "Goal-Free Evaluation," in School
Evaluation, ed. E. R. House, McCutchan, 1973.

Eisner, E. W., The Educational Imagination, Macmillan,
1979.




13. Wolf, R. L., "The Use of Judicial Evaluation Methods
in the Formulation of Educational Policy," Educational
Evaluation and Policy Analysis 1, May-June 1979.

14. Barrett, R. S., Performance Rating, Science Research
Associates, 1966.

15. Scriven, M., "Objectivity and Subjectivity in
Educational Research," in Philosophical Redirection of
Educational Research, ed. L. G. Thomas, National
Society for the Study of Education, 1972.

16. Cummings, L. L., and D. P. Schwab, Performance in
Organizations, Scott, Foresman and Company, 1973.


17. Klimoski, R. J., and M. London, "Role of the Rater in
Performance Appraisal," Journal of Applied Psychology,
vol. 59, no. 4, p. 445-451, 1974.

18. Love, K. G., "Comparison of Peer Assessment Methods:
Reliability, Validity, Friendship Bias, and User
Reaction," Journal of Applied Psychology, vol. 66, no.
4, p. 451-457, 1981.


19. Holzbach, R. L., "Rater Bias in Performance Ratings:
Superior, Self-, and Peer Ratings," Journal of Applied
Psychology, vol. 63, no. 5, p. 579-588, 1978.

20. Scott, R. D., "Taking Subjectivity Out of Performance
Appraisal," Personnel, p. 45-49, July-August 1973.


21. Dayal, I., "Some Issues in Performance Appraisal,"
Personnel Administration, p. 27-30, January-February
1969.

22. Gallagher, M. C., "More Bias in Performance
Evaluation?", Personnel, p. 35-40, July-August 1978.


23. Bernardin, H. J., "Effects of Rater Training on
Leniency and Halo Errors in Student Ratings of
Instructors," Journal of Applied Psychology, vol. 63,
no. 3, p. 301-308, 1978.

24. Wiley, L., and W. Jenkins, "Selecting Competent
Raters," Journal of Applied Psychology, vol. 48, no.
4, p. 215-217, 1964.


25. ..., McGraw-Hill, 1965.

26. ..., December, 1977.
























